Decision Trees in Practice

In this assignment we will explore various techniques for preventing overfitting in decision trees. We will extend the binary decision tree implementation from the previous assignment, so you will need your solutions from that assignment.

In this assignment you will:

  • Implement binary decision trees with different early stopping methods.
  • Compare models with different stopping parameters.
  • Visualize the concept of overfitting in decision trees.

Let's get started!

Fire up GraphLab Create

Make sure you have the latest version of GraphLab Create.

In [1]:
import graphlab as gl
print('gl.version: %s' % (gl.version))
gl.canvas.set_target('ipynb')
import math
import string

# my imports
import pandas as pd
import numpy as np
from six.moves import cPickle as pickle
gl.version: 1.8.4
In [2]:
from types import MethodType
def value_counts( self ):
    import pandas as pd
    pdDf = self.to_dataframe()
    for ftr in pdDf.columns:
        print(pdDf[ftr].value_counts())
        
#SFrame.value_counts = MethodType(value_counts, None, SFrame)
#setattr(SFrame, 'value_counts', value_counts)
#setattr(glbObsAll, 'value_counts', value_counts)

Load LendingClub Dataset

This assignment will use the LendingClub dataset used in the previous two assignments.

In [3]:
glbObsAll = gl.SFrame('data/lending-club-data.gl/')
In [4]:
print(glbObsAll.shape)
glbObsAll.show()
print(glbObsAll)
(122607, 68)
+---------+-----------+-----------+-------------+-----------------+------------+
|    id   | member_id | loan_amnt | funded_amnt | funded_amnt_inv |    term    |
+---------+-----------+-----------+-------------+-----------------+------------+
| 1077501 |  1296599  |    5000   |     5000    |       4975      |  36 months |
| 1077430 |  1314167  |    2500   |     2500    |       2500      |  60 months |
| 1077175 |  1313524  |    2400   |     2400    |       2400      |  36 months |
| 1076863 |  1277178  |   10000   |    10000    |      10000      |  36 months |
| 1075269 |  1311441  |    5000   |     5000    |       5000      |  36 months |
| 1072053 |  1288686  |    3000   |     3000    |       3000      |  36 months |
| 1071795 |  1306957  |    5600   |     5600    |       5600      |  60 months |
| 1071570 |  1306721  |    5375   |     5375    |       5350      |  60 months |
| 1070078 |  1305201  |    6500   |     6500    |       6500      |  60 months |
| 1069908 |  1305008  |   12000   |    12000    |      12000      |  36 months |
+---------+-----------+-----------+-------------+-----------------+------------+
+----------+-------------+-------+-----------+-----------------------+------------+
| int_rate | installment | grade | sub_grade |       emp_title       | emp_length |
+----------+-------------+-------+-----------+-----------------------+------------+
|  10.65   |    162.87   |   B   |     B2    |                       | 10+ years  |
|  15.27   |    59.83    |   C   |     C4    |         Ryder         |  < 1 year  |
|  15.96   |    84.33    |   C   |     C5    |                       | 10+ years  |
|  13.49   |    339.31   |   C   |     C1    |  AIR RESOURCES BOARD  | 10+ years  |
|   7.9    |    156.46   |   A   |     A4    |  Veolia Transportaton |  3 years   |
|  18.64   |    109.43   |   E   |     E1    |    MKC Accounting     |  9 years   |
|  21.28   |    152.39   |   F   |     F2    |                       |  4 years   |
|  12.69   |    121.45   |   B   |     B5    |       Starbucks       |  < 1 year  |
|  14.65   |    153.45   |   C   |     C3    | Southwest Rural metro |  5 years   |
|  12.69   |    402.54   |   B   |     B5    |          UCLA         | 10+ years  |
+----------+-------------+-------+-----------+-----------------------+------------+
+----------------+------------+-----------------+-----------------+-------------+
| home_ownership | annual_inc |     is_inc_v    |     issue_d     | loan_status |
+----------------+------------+-----------------+-----------------+-------------+
|      RENT      |   24000    |     Verified    | 20111201T000000 |  Fully Paid |
|      RENT      |   30000    | Source Verified | 20111201T000000 | Charged Off |
|      RENT      |   12252    |   Not Verified  | 20111201T000000 |  Fully Paid |
|      RENT      |   49200    | Source Verified | 20111201T000000 |  Fully Paid |
|      RENT      |   36000    | Source Verified | 20111201T000000 |  Fully Paid |
|      RENT      |   48000    | Source Verified | 20111201T000000 |  Fully Paid |
|      OWN       |   40000    | Source Verified | 20111201T000000 | Charged Off |
|      RENT      |   15000    |     Verified    | 20111201T000000 | Charged Off |
|      OWN       |   72000    |   Not Verified  | 20111201T000000 |  Fully Paid |
|      OWN       |   75000    | Source Verified | 20111201T000000 |  Fully Paid |
+----------------+------------+-----------------+-----------------+-------------+
+------------+-------------------------------+-------------------------------+-----+
| pymnt_plan |              url              |              desc             | ... |
+------------+-------------------------------+-------------------------------+-----+
|     n      | https://www.lendingclub.co... |   Borrower added on 12/22/... | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/22/... | ... |
|     n      | https://www.lendingclub.co... |                               | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/21/... | ... |
|     n      | https://www.lendingclub.co... |                               | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/16/... | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/21/... | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/16/... | ... |
|     n      | https://www.lendingclub.co... |   Borrower added on 12/15/... | ... |
|     n      | https://www.lendingclub.co... |                               | ... |
+------------+-------------------------------+-------------------------------+-----+
[122607 rows x 68 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

As before, we reassign the labels to have +1 for a safe loan, and -1 for a risky (bad) loan.

In [5]:
glbObsAll['safe_loan'] = glbObsAll['bad_loans'].apply(lambda x : +1 if x==0 else -1)
glbObsAll['safe_loan'].show(view = 'Categorical')
In [6]:
glbObsAll = glbObsAll.remove_column('bad_loans')
glbObsAll.save('data/module-6-decision-tree-practical-assignment_glbObsAll.gl')

We will be using the same 4 categorical features as in the previous assignment:

  1. grade of the loan
  2. the length of the loan term
  3. the home ownership status: own, mortgage, rent
  4. number of years of employment.

In the dataset, each of these features is categorical. Since we are building a binary decision tree, we will convert them to binary features in a subsequent section using 1-hot encoding.

In [8]:
features = ['grade',              # grade of the loan
            'term',               # the term of the loan
            'home_ownership',     # home_ownership status: own, mortgage or rent
            'emp_length',         # number of years of employment
           ]
target = 'safe_loan'
glbObsAll = glbObsAll[features + [target]]
In [9]:
print(glbObsAll.shape)
glbObsAll.show()
glbObsAll
(122607, 5)
Out[9]:
grade term home_ownership emp_length safe_loan
B 36 months RENT 10+ years 1
C 60 months RENT < 1 year -1
C 36 months RENT 10+ years 1
C 36 months RENT 10+ years 1
A 36 months RENT 3 years 1
E 36 months RENT 9 years 1
F 60 months OWN 4 years -1
B 60 months RENT < 1 year -1
C 60 months OWN 5 years 1
B 36 months OWN 10+ years 1
[122607 rows x 5 columns]
Note: Only the head of the SFrame is printed.
You can use print_rows(num_rows=m, num_columns=n) to print more rows and columns.

Subsample dataset to make sure classes are balanced

Just as we did in the previous assignment, we will undersample the larger class (safe loans) in order to balance out our dataset. This means we are throwing away many data points. We use seed = 1 so everyone gets the same results.

In [10]:
glbObsSfe = glbObsAll[glbObsAll[target] == +1]
glbObsRsk = glbObsAll[glbObsAll[target] == -1]

# Since there are fewer risky loans than safe loans, find the ratio of the sizes
# and use that ratio as the sampling fraction to undersample the safe loans.
percentage = len(glbObsRsk)/float(len(glbObsSfe))
glbObsSfeSmp = glbObsSfe.sample(percentage, seed = 1)
#risky_glbObsAll = glbObsRsk
glbObsSmp = glbObsRsk.append(glbObsSfeSmp)

print "Percentage of safe  loans                : %.2f%%" % \
        (len(glbObsSfeSmp) * 100.0 / float(len(glbObsSmp)))
print "Percentage of risky loans                : %.2f%%" % \
        (len(glbObsRsk)    * 100.0 / float(len(glbObsSmp)))
print "Total number of loans in our new dataset :", len(glbObsSmp)
Percentage of safe  loans                : 50.22%
Percentage of risky loans                : 49.78%
Total number of loans in our new dataset : 46508

Note: There are many approaches for dealing with imbalanced data, including some where we modify the learning algorithm. These approaches are beyond the scope of this course, but some of them are reviewed in this paper. For this assignment, we use the simplest possible approach, where we subsample the overly represented class to get a more balanced dataset. In general, and especially when the data is highly imbalanced, we recommend using more advanced methods.
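
The undersampling idea can also be sketched without GraphLab. The snippet below is only an illustration on plain Python lists: undersample_majority and the toy data are hypothetical names, not part of the assignment. Unlike the notebook cell above, which samples by a fraction and ends up only approximately balanced (50.22% vs. 49.78%), this sketch draws exactly as many majority rows as there are minority rows.

```python
import random

def undersample_majority(rows, label_key, seed=1):
    """Hypothetical helper: balance two classes by dropping random majority rows."""
    rng = random.Random(seed)
    pos = [r for r in rows if r[label_key] == +1]
    neg = [r for r in rows if r[label_key] == -1]
    # Identify the smaller and the larger class.
    small, large = (pos, neg) if len(pos) <= len(neg) else (neg, pos)
    # Keep every minority row plus an equally sized random subset of the majority.
    return small + rng.sample(large, len(small))

# Toy data (made up): 8 safe loans and 2 risky loans.
toy = [{'safe_loan': +1} for _ in range(8)] + [{'safe_loan': -1} for _ in range(2)]
balanced = undersample_majority(toy, 'safe_loan')
n_safe = sum(1 for r in balanced if r['safe_loan'] == +1)
n_risky = sum(1 for r in balanced if r['safe_loan'] == -1)
```

Here 6 of the 8 safe loans are discarded, which illustrates the cost mentioned above: balancing by undersampling throws data away.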

Transform categorical data into binary features

Since we are implementing binary decision trees, we transform our categorical data into binary data using 1-hot encoding, just as in the previous assignment. Here is the summary of that discussion:

For instance, the home_ownership feature represents the home ownership status of the loanee, which is either own, mortgage or rent. If a data point has the feature

   {'home_ownership': 'RENT'}

we want to turn this into three features:

 { 
   'home_ownership = OWN'      : 0, 
   'home_ownership = MORTGAGE' : 0, 
   'home_ownership = RENT'     : 1
 }

Since this code requires a few Python and GraphLab tricks, feel free to use this block of code as is. Refer to the API documentation for a deeper understanding.

In [11]:
#glbObsSmp = risky_glbObsAll.append(glbObsSfeSmp)
for feature in features:
    glbObsSmp_one_hot_encoded = glbObsSmp[feature].apply(lambda x: {x: 1})    
    glbObsSmp_unpacked = glbObsSmp_one_hot_encoded.unpack(column_name_prefix=feature)
    
    # Change None's to 0's
    for column in glbObsSmp_unpacked.column_names():
        glbObsSmp_unpacked[column] = glbObsSmp_unpacked[column].fillna(0)

    glbObsSmp.remove_column(feature)
    glbObsSmp.add_columns(glbObsSmp_unpacked)
In [12]:
glbObsSmp.save('data/module-6-decision-tree-practical-assignment_glbObsSmp.gl')

The feature columns now look like this:

In [13]:
features = glbObsSmp.column_names()
features.remove('safe_loan')  # Remove the response variable
features
Out[13]:
['grade.A',
 'grade.B',
 'grade.C',
 'grade.D',
 'grade.E',
 'grade.F',
 'grade.G',
 'term. 36 months',
 'term. 60 months',
 'home_ownership.MORTGAGE',
 'home_ownership.OTHER',
 'home_ownership.OWN',
 'home_ownership.RENT',
 'emp_length.1 year',
 'emp_length.10+ years',
 'emp_length.2 years',
 'emp_length.3 years',
 'emp_length.4 years',
 'emp_length.5 years',
 'emp_length.6 years',
 'emp_length.7 years',
 'emp_length.8 years',
 'emp_length.9 years',
 'emp_length.< 1 year',
 'emp_length.n/a']
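
For readers who want to see the 1-hot transformation without the GraphLab apply/unpack tricks, here is a minimal sketch on a list of dictionaries. The helper one_hot and the two toy rows are assumptions made for illustration; the column naming 'feature.value' simply mirrors the names printed above.

```python
def one_hot(rows, feature):
    """Hypothetical helper: expand one categorical column into 0/1 columns."""
    # Every distinct value of the feature becomes its own binary column.
    values = sorted({row[feature] for row in rows})
    encoded = []
    for row in rows:
        # Copy all other columns unchanged.
        new_row = {k: v for k, v in row.items() if k != feature}
        for value in values:
            new_row['%s.%s' % (feature, value)] = 1 if row[feature] == value else 0
        encoded.append(new_row)
    return encoded

# Toy rows (made up for illustration).
rows = [{'home_ownership': 'RENT', 'safe_loan': 1},
        {'home_ownership': 'OWN', 'safe_loan': -1}]
encoded = one_hot(rows, 'home_ownership')
```

Each encoded row has exactly one of the new columns set to 1, which is what lets the binary tree split on a single 0/1 question per node.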

Train-Validation split

We split the data into training and validation sets, with 80% of the data in the training set and 20% in the validation set. We use seed=1 so that everyone gets the same result.

In [14]:
glbObsFit, glbObsOOB = glbObsSmp.random_split(.8, seed=1)
print(glbObsFit.shape)
print(glbObsOOB.shape)
(37224, 26)
(9284, 26)

Early stopping methods for decision trees

In this section, we will extend the binary tree implementation from the previous assignment in order to handle some early stopping conditions. Recall the 3 early stopping methods that were discussed in lecture:

  1. Reached a maximum depth. (set by parameter depthMax).
  2. Reached a minimum node size. (set by parameter nodeSizeMin).
  3. Don't split if the gain in error reduction is too small. (set by parameter errReductionMin).

For the rest of this assignment, we will refer to these three as early stopping conditions 1, 2, and 3.

Early stopping condition 1: Maximum depth

Recall that we already implemented the maximum depth stopping condition in the previous assignment. In this assignment, we will experiment with this condition a bit more and also write code to implement the 2nd and 3rd early stopping conditions.

We will be reusing code from the previous assignment and then building upon it. We will alert you when you reach a function that was part of the previous assignment so that you can simply copy and paste your previous code.

Early stopping condition 2: Minimum node size

The function isReachedNodeSizeMin takes 2 arguments:

  1. The data (from a node)
  2. The minimum number of data points that a node is allowed to split on, nodeSizeMin.

This function simply calculates whether the number of data points at a given node is less than or equal to the specified minimum node size. This function will be used to detect this early stopping condition in the bldDecisionTree function.

Fill in the parts of the function below where you find ## YOUR CODE HERE. There is one instance in the function below.

In [15]:
def isReachedNodeSizeMin(data, nodeSizeMin):
    # Return True if the number of data points is less than or equal to the minimum node size.
    ## YOUR CODE HERE
    return(data.shape[0] <= nodeSizeMin)
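
The boundary behaviour of this check is easy to get wrong, so here is a small sketch that works on a plain count instead of an SFrame. The name is_reached_node_size_min and the node sizes are illustrative assumptions; the comparison is the same "less than or equal to" used above.

```python
def is_reached_node_size_min(n_points, node_size_min):
    # A node is too small to split when it holds at most node_size_min points.
    return n_points <= node_size_min

# With a minimum node size of 100, a node of exactly 100 points already stops.
checks = [(n, is_reached_node_size_min(n, 100)) for n in (101, 100, 96)]
# checks == [(101, False), (100, True), (96, True)]
```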
    

Quiz question: Given an intermediate node with 6 safe loans and 3 risky loans, if the nodeSizeMin parameter is 10, what should the tree learning algorithm do next?

Early stopping condition 3: Minimum gain in error reduction

The function getErrReduction takes 2 arguments:

  1. The error before a split, errSplitBfr.
  2. The error after a split, errSplitAfr.

This function computes the gain in error reduction, i.e., the difference between the error before the split and that after the split. This function will be used to detect this early stopping condition in the bldDecisionTree function.

Fill in the parts of the function below where you find ## YOUR CODE HERE. There is one instance in the function below.

In [16]:
def getErrReduction(errSplitBfr, errSplitAfr):
    # Return the error before the split minus the error after the split.
    ## YOUR CODE HERE
    return(errSplitBfr - errSplitAfr)
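
As a worked example of the arithmetic, consider a made-up node with 32 data points (20 safe, 12 risky) and a candidate split into two children of 16 points each. The counts and the thresholds below are assumptions chosen only to keep the numbers exact; error_reduction mirrors getErrReduction.

```python
def error_reduction(err_before, err_after):
    # Gain in error reduction: error before the split minus error after it.
    return err_before - err_after

n = 32.0
# Before the split: the majority class is safe, so the 12 risky loans are mistakes.
err_before = 12 / n            # 0.375
# After the split: left child has 14 safe / 2 risky (2 mistakes),
# right child has 6 safe / 10 risky (6 mistakes).
err_after = (2 + 6) / n        # 0.25
gain = error_reduction(err_before, err_after)   # 0.125

stop_if_min_is_015 = gain <= 0.15   # True: the gain is too small, make a leaf
stop_if_min_is_010 = gain <= 0.10   # False: the gain is large enough, keep splitting
```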

Quiz question: Assume an intermediate node has 6 safe loans and 3 risky loans. For each of 4 possible features to split on, the error reduction is 0.0, 0.05, 0.1, and 0.14, respectively. If the minimum gain in error reduction parameter is set to 0.2, what should the tree learning algorithm do next?

Grabbing binary decision tree helper functions from past assignment

Recall from the previous assignment that we wrote a function getNMistakesIntermediateNode that calculates the number of misclassified examples when predicting the majority class. This is used to help determine which feature is best to split on at a given node of the tree.

Please copy and paste your code for getNMistakesIntermediateNode here.

In [17]:
def getNMistakesIntermediateNode(labelsNode):
    # Corner case: If labelsNode is empty, return 0
    if len(labelsNode) == 0:
        return 0
    
    # Count the number of 1's (safe loans)
    ## YOUR CODE HERE
    nSfeLoans = len(labelsNode[labelsNode == +1])
    
    # Count the number of -1's (risky loans)
    ## YOUR CODE HERE
    nRskLoans = len(labelsNode[labelsNode == -1])    
                
    # Return the number of mistakes that the majority classifier makes.
    ## YOUR CODE HERE
    if (nSfeLoans >= nRskLoans):
        nMistakes = nRskLoans
    else:
        nMistakes = nSfeLoans
        
    return(nMistakes)
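
The same count can be sketched on a plain Python list of labels, which makes the logic easy to test by hand. The function name count_majority_mistakes is hypothetical; it relies on the observation that the majority classifier misclassifies exactly the minority class, so the number of mistakes is the smaller of the two counts.

```python
def count_majority_mistakes(labels):
    # Count each class; an empty list yields 0 for both counts.
    n_safe = sum(1 for y in labels if y == +1)
    n_risky = sum(1 for y in labels if y == -1)
    # Predicting the majority class gets every minority example wrong.
    return min(n_safe, n_risky)

mistakes_mixed = count_majority_mistakes([+1, +1, +1, -1, -1])   # 2
mistakes_pure = count_majority_mistakes([+1, +1])                # 0
mistakes_empty = count_majority_mistakes([])                     # 0
```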

We then wrote a function getFeatureSplitBest that finds the best feature to split on given the data and a list of features to consider.

Please copy and paste your getFeatureSplitBest code here.

In [18]:
def getFeatureSplitBest(data, features, target):
    
    featureBest = None # Keep track of the best feature 
    errorBest = 10     # Keep track of the best error so far 
    # Note: Since error is always <= 1, we should initialize it with something larger than 1.

    # Convert to float to make sure error gets computed correctly.
    nObs = float(len(data))  
    
    # Loop through each feature to consider splitting on that feature
    for feature in features:
        
        # The left split will have all data points where the feature value is 0
        splitLft = data[data[feature] == 0]
        
        # The right split will have all data points where the feature value is 1
        ## YOUR CODE HERE
        splitRgt = data[data[feature] != 0] 
            
        # Calculate the number of misclassified examples in the left split.
        # Remember that we implemented a function for this! 
        #   (It was called getNMistakesIntermediateNode)
        # YOUR CODE HERE
        nMistakesLft = getNMistakesIntermediateNode(splitLft[target])             

        # Calculate the number of misclassified examples in the right split.
        ## YOUR CODE HERE
        nMistakesRgt = getNMistakesIntermediateNode(splitRgt[target])
            
        # Compute the classification error of this split.
        # Error = (# of mistakes (left) + # of mistakes (right)) / (# of data points)
        ## YOUR CODE HERE
        error = (nMistakesLft + nMistakesRgt) / nObs

        # If this is the best error we have found so far, 
        #   store the feature as featureBest and the error as errorBest
        ## YOUR CODE HERE
        if error < errorBest:
            featureBest = feature
            errorBest = error
    
    return featureBest # Return the best feature we found
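
To see the selection rule in isolation, the sketch below applies the same procedure to a list of dictionaries instead of an SFrame. The names best_split_feature and count_majority_mistakes, and the four toy rows, are assumptions for illustration. Feature 'a' separates the classes perfectly (error 0), while feature 'b' leaves one mistake on each side (error 0.5).

```python
def count_majority_mistakes(labels):
    n_safe = sum(1 for y in labels if y == +1)
    n_risky = sum(1 for y in labels if y == -1)
    return min(n_safe, n_risky)

def best_split_feature(rows, features, target):
    best_feature, best_error = None, float('inf')
    n = float(len(rows))
    for feature in features:
        # Left split: feature value 0. Right split: feature value 1.
        left = [r[target] for r in rows if r[feature] == 0]
        right = [r[target] for r in rows if r[feature] != 0]
        error = (count_majority_mistakes(left) + count_majority_mistakes(right)) / n
        # Strict '<' keeps the first feature seen when two features tie.
        if error < best_error:
            best_feature, best_error = feature, error
    return best_feature

# Toy data (made up): 'a' matches the label exactly, 'b' is uninformative.
rows = [{'a': 1, 'b': 0, 'y': +1},
        {'a': 1, 'b': 1, 'y': +1},
        {'a': 0, 'b': 0, 'y': -1},
        {'a': 0, 'b': 1, 'y': -1}]
chosen = best_split_feature(rows, ['b', 'a'], 'y')   # 'a'
```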

Finally, recall the function bldLeaf from the previous assignment, which creates a leaf node given a set of target values.

Please copy and paste your bldLeaf code here.

In [20]:
def bldLeaf(targetVctr):
    
    # Create a leaf node
    leaf = {'featureSplit' : None,
            'lft'          : None,
            'rgt'          : None,
            'isLeaf'       : True}   ## YOUR CODE HERE
    
    # Count the number of data points that are +1 and -1 in this node.
    nPls = len(targetVctr[targetVctr == +1])
    nMns = len(targetVctr[targetVctr == -1])
    
    # For the leaf node, set the prediction to be the majority class.
    # Store the predicted class (1 or -1) in leaf['prediction']
    if nPls > nMns:
        leaf['prediction'] = +1         ## YOUR CODE HERE
    else:
        leaf['prediction'] = -1         ## YOUR CODE HERE
        
    # Return the leaf node        
    return leaf 

Incorporating new early stopping conditions in binary decision tree implementation

Now, you will implement a function that builds a decision tree handling the three early stopping conditions described in this assignment. In particular, you will write code to detect early stopping conditions 2 and 3, using the functions you implemented above. The 1st early stopping condition, depthMax, was implemented in the previous assignment and you will not need to reimplement it. In addition to these early stopping conditions, the typical stopping conditions of having no mistakes or no more features to split on (which we denote by "stopping conditions" 1 and 2) are also included, as in the previous assignment.

Implementing early stopping condition 2: minimum node size:

  • Step 1: Use the function isReachedNodeSizeMin that you implemented earlier to write an if condition to detect whether we have hit the base case, i.e., the node does not have enough data points and should be turned into a leaf. Don't forget to use the nodeSizeMin argument.
  • Step 2: Return a leaf. This line of code should be the same as the other (pre-implemented) stopping conditions.

Implementing early stopping condition 3: minimum error reduction:

Note: This has to come after finding the best splitting feature so we can calculate the error after splitting in order to calculate the error reduction.

  • Step 1: Calculate the classification error before splitting. Recall that classification error is defined as:
$$ \text{classification error} = \frac{\text{# mistakes}}{\text{# total examples}} $$
  • Step 2: Calculate the classification error after splitting. This requires calculating the number of mistakes in the left and right splits, and then dividing by the total number of examples.
  • Step 3: Use the function getErrReduction that you implemented earlier to write an if condition to detect whether the reduction in error is less than or equal to the constant provided (errReductionMin). Don't forget to use that argument.
  • Step 4: Return a leaf. This line of code should be the same as the other (pre-implemented) stopping conditions.

Fill in the places where you find ## YOUR CODE HERE. There are seven places in this function for you to fill in.

In [22]:
def bldDecisionTree(data, features, target, 
                    depthCur = 0, depthMax = 10, 
                    nodeSizeMin = 1, 
                    errReductionMin = 0.0):
    
    featureRemain = features[:] # Make a copy of the features.
    
    targetVctr = data[target]
    print "--------------------------------------------------------------------"
    print "Subtree, depth = %s (%s data points)." % (depthCur, len(targetVctr))
    
    
    # Stopping condition 1: All data points have the same target value.
    if getNMistakesIntermediateNode(targetVctr) == 0:
        print "Stopping condition 1 reached. All data points have the same target value."                
        return bldLeaf(targetVctr)
    
    # Stopping condition 2: No more features to split on.
    if featureRemain == []:
        print "Stopping condition 2 reached. No remaining features."                
        return bldLeaf(targetVctr)    
    
    # Early stopping condition 1: Reached max depth limit.
    if depthCur >= depthMax:
        print "Early stopping condition 1 reached. Reached maximum depth."
        return bldLeaf(targetVctr)
    
    # Early stopping condition 2: Reached the minimum node size.
    # If the number of data points is less than or equal to the minimum size, return a leaf.
    if isReachedNodeSizeMin(data, nodeSizeMin):  ## YOUR CODE HERE
        print "Early stopping condition 2 reached. Reached minimum node size."
        return bldLeaf(targetVctr)  ## YOUR CODE HERE
    
    # Find the best splitting feature
    featureSplitBest = getFeatureSplitBest(data, features, target)
    
    # Split on the best feature that we found. 
    splitLft = data[data[featureSplitBest] == 0]
    splitRgt = data[data[featureSplitBest] != 0]
    
    # Early stopping condition 3: Minimum error reduction
    # Calculate the error before splitting (number of misclassified examples 
    # divided by the total number of examples)
    errSplitBfr = getNMistakesIntermediateNode(targetVctr) / float(len(data))
    
    # Calculate the error after splitting (number of misclassified examples 
    # in both groups divided by the total number of examples)
    mistakesLft = getNMistakesIntermediateNode(splitLft[target]) ## YOUR CODE HERE
    mistakesRgt = getNMistakesIntermediateNode(splitRgt[target]) ## YOUR CODE HERE
    errSplitAfr = (mistakesLft + mistakesRgt) / float(len(data))
    
    # If the error reduction is LESS THAN OR EQUAL TO errReductionMin, return a leaf.
    if getErrReduction(errSplitBfr, errSplitAfr) <= errReductionMin:        ## YOUR CODE HERE
        print "Early stopping condition 3 reached. Minimum error reduction."
        return bldLeaf(targetVctr)  ## YOUR CODE HERE 
    
    
    featureRemain.remove(featureSplitBest)
    print "Split on feature %s. (%s, %s)" % (\
                      featureSplitBest, len(splitLft), len(splitRgt))
    
    
    # Repeat (recurse) on left and right subtrees
    treeLft = bldDecisionTree(splitLft, featureRemain, target, 
                                     depthCur + 1, depthMax, nodeSizeMin, errReductionMin)        
    
    ## YOUR CODE HERE
    treeRgt = bldDecisionTree(splitRgt, featureRemain, target, 
                                     depthCur + 1, depthMax, nodeSizeMin, errReductionMin)        
    
    
    return {'isLeaf'        : False, 
            'prediction'    : None,
            'featureSplit'  : featureSplitBest,
            'lft'           : treeLft, 
            'rgt'           : treeRgt}

Here is a function to count the nodes in your tree:

In [23]:
def getNNodes(tree):
    if tree['isLeaf']:
        return 1
    return 1 + getNNodes(tree['lft']) + getNNodes(tree['rgt'])
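
As a sanity check on the counting logic, the sketch below builds a small tree by hand and counts its nodes. The helpers leaf and node, and the leaf predictions, are hypothetical and exist only for this illustration; the dictionary keys match the ones returned by bldDecisionTree. A full binary tree of depth 2 has 1 + 2 + 4 = 7 nodes.

```python
def count_nodes(tree):
    # Same recursion as getNNodes: a leaf counts as one node.
    if tree['isLeaf']:
        return 1
    return 1 + count_nodes(tree['lft']) + count_nodes(tree['rgt'])

def leaf(prediction):
    return {'isLeaf': True, 'prediction': prediction,
            'featureSplit': None, 'lft': None, 'rgt': None}

def node(feature, lft, rgt):
    return {'isLeaf': False, 'prediction': None,
            'featureSplit': feature, 'lft': lft, 'rgt': rgt}

# Hand-built depth-2 tree; the predictions are made up.
tree = node('term. 36 months',
            node('grade.A', leaf(-1), leaf(+1)),
            node('grade.D', leaf(+1), leaf(-1)))
n_nodes = count_nodes(tree)   # 7
```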

Run the following test code to check your implementation. Make sure you get 'Test passed' before proceeding.

In [25]:
smallDTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 2, 
                                        nodeSizeMin = 10, errReductionMin = 0.0)
if getNNodes(smallDTree) == 7:
    print 'Test passed!'
else:
    print 'Test failed... try again!'
    print 'Number of nodes found                :', getNNodes(smallDTree)
    print 'Number of nodes that should be there : 7' 
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
Test passed!

Build a tree!

Now that your code is working, we will train a tree model on the training set glbObsFit with the following parameters:

  • depthMax = 6
  • nodeSizeMin = 100
  • errReductionMin = 0.0

Warning: This code block may take a minute to run.

In [26]:
newMyDTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6, 
                                nodeSizeMin = 100, errReductionMin = 0.0)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Early stopping condition 3 reached. Minimum error reduction.

Let's now train a tree model ignoring early stopping conditions 2 and 3 so that we get the same tree as in the previous assignment. To ignore these conditions, we set nodeSizeMin=0 and errReductionMin=-1 (a negative value).
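
To see why these two values switch the checks off, here is a minimal sketch of what early stopping conditions 2 and 3 might look like. The helper names and the `<=` comparisons are assumptions made for illustration; the real checks live inside bldDecisionTree and may be written differently.

```python
def reachedNodeSizeMin(numPoints, nodeSizeMin):
    # Hypothetical check for early stopping condition 2:
    # stop when the node holds nodeSizeMin data points or fewer.
    return numPoints <= nodeSizeMin

def belowErrReductionMin(errBefore, errAfter, errReductionMin):
    # Hypothetical check for early stopping condition 3:
    # stop when the split lowers the error by errReductionMin or less.
    return (errBefore - errAfter) <= errReductionMin

# With nodeSizeMin = 0, a non-empty node is never considered too small.
print(reachedNodeSizeMin(11, 0))             # False
print(reachedNodeSizeMin(11, 100))           # True

# With errReductionMin = -1, even a split that gives no improvement
# (error reduction of 0.0) is not stopped.
print(belowErrReductionMin(0.4, 0.4, -1))    # False
print(belowErrReductionMin(0.4, 0.4, 0.0))   # True
```

Under these assumptions, only the maximum-depth check (and the ordinary stopping conditions) can end a branch, which is the behavior of the tree from the previous assignment.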

In [27]:
oldMyDTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6, 
                                nodeSizeMin = 0, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
--------------------------------------------------------------------
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
--------------------------------------------------------------------
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.

Making predictions

Recall that in the previous assignment you implemented a function classifyDTree to classify a new point x using a given tree.

Please copy and paste your classifyDTree code here.

In [28]:
def classifyDTree(tree, x, annotate = False):
    # At a leaf node, return the stored prediction.
    if tree['isLeaf']:
        if annotate:
            print("At leaf, predicting %s" % tree['prediction'])
        return tree['prediction']

    # Otherwise, look up the value of the feature this node splits on.
    featureSplitVal = x[tree['featureSplit']]
    if annotate:
        print("Split on %s = %s" % (tree['featureSplit'], featureSplitVal))

    # A value of 0 goes to the left subtree; anything else goes to the right.
    if featureSplitVal == 0:
        return classifyDTree(tree['lft'], x, annotate)
    else:
        return classifyDTree(tree['rgt'], x, annotate)

Now, let's consider the first example of the validation set and see what the newMyDTree model predicts for this data point.

In [29]:
glbObsOOB[0]
Out[29]:
{'emp_length.1 year': 0,
 'emp_length.10+ years': 0,
 'emp_length.2 years': 1,
 'emp_length.3 years': 0,
 'emp_length.4 years': 0,
 'emp_length.5 years': 0,
 'emp_length.6 years': 0,
 'emp_length.7 years': 0,
 'emp_length.8 years': 0,
 'emp_length.9 years': 0,
 'emp_length.< 1 year': 0,
 'emp_length.n/a': 0,
 'grade.A': 0,
 'grade.B': 0,
 'grade.C': 0,
 'grade.D': 1,
 'grade.E': 0,
 'grade.F': 0,
 'grade.G': 0,
 'home_ownership.MORTGAGE': 0,
 'home_ownership.OTHER': 0,
 'home_ownership.OWN': 0,
 'home_ownership.RENT': 1,
 'safe_loan': -1,
 'term. 36 months': 0,
 'term. 60 months': 1}
In [30]:
print('Predicted class: %s ' % classifyDTree(newMyDTree, glbObsOOB[0]))
Predicted class: -1 

Let's add some annotations to our prediction to see the prediction path that led to this predicted class:

In [31]:
classifyDTree(newMyDTree, glbObsOOB[0], annotate = True)
Split on term. 36 months = 0
Split on grade.A = 0
At leaf, predicting -1
Out[31]:
-1

Let's now recall the prediction path for the decision tree learned in the previous assignment, which we recreated here as oldMyDTree.

In [32]:
classifyDTree(oldMyDTree, glbObsOOB[0], annotate = True)
Split on term. 36 months = 0
Split on grade.A = 0
Split on grade.B = 0
Split on grade.C = 0
Split on grade.D = 1
Split on grade.E = 0
At leaf, predicting -1
Out[32]:
-1

Quiz question: For newMyDTree trained with depthMax = 6, nodeSizeMin = 100, errReductionMin=0.0, is the prediction path for glbObsOOB[0] shorter, longer, or the same as for oldMyDTree that ignored the early stopping conditions 2 and 3?

Quiz question: For newMyDTree trained with depthMax = 6, nodeSizeMin = 100, errReductionMin=0.0, is the prediction path for any point always shorter, always longer, always the same, shorter or the same, or longer or the same as for oldMyDTree that ignored the early stopping conditions 2 and 3?

Quiz question: For a tree trained on any dataset using depthMax = 6, nodeSizeMin = 100, errReductionMin=0.0, what is the maximum number of splits encountered while making a single prediction?
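
One way to check path lengths empirically is to count the split nodes visited on the way to a leaf. The sketch below assumes the tree dictionary layout used by classifyDTree (keys 'isLeaf', 'featureSplit', 'lft', 'rgt', 'prediction'); countSplits and toyTree are hypothetical names introduced here and are not part of the assignment.

```python
def countSplits(tree, x):
    # Count the split nodes visited before reaching a leaf.
    numSplits = 0
    while not tree['isLeaf']:
        numSplits += 1
        # Same routing rule as classifyDTree: 0 goes left, anything else right.
        if x[tree['featureSplit']] == 0:
            tree = tree['lft']
        else:
            tree = tree['rgt']
    return numSplits

# Hypothetical toy tree with two split nodes, for illustration only.
toyLeafNeg = {'isLeaf': True, 'prediction': -1}
toyLeafPos = {'isLeaf': True, 'prediction': +1}
toyTree = {
    'isLeaf': False, 'featureSplit': 'term. 36 months',
    'lft': {'isLeaf': False, 'featureSplit': 'grade.A',
            'lft': toyLeafNeg, 'rgt': toyLeafPos},
    'rgt': toyLeafPos,
}

print(countSplits(toyTree, {'term. 36 months': 0, 'grade.A': 0}))   # 2
print(countSplits(toyTree, {'term. 36 months': 1, 'grade.A': 0}))   # 1
```

Calling such a helper with newMyDTree and oldMyDTree on the same rows of glbObsOOB would let you compare the two path lengths point by point.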

Evaluating the model

Now let us evaluate the model that we have trained. You implemented this evaluation in the function evlClassificationError from the previous assignment.

Please copy and paste your evlClassificationError code here.

In [33]:
def evlClassificationError(tree, data):
    # Apply classifyDTree(tree, x) to each row of the data.
    prediction = data.apply(lambda x: classifyDTree(tree, x))

    # Classification error = (number of mistakes) / (number of data points).
    # Note: `target` is not a parameter here; it is read from the notebook's global scope.
    numMistakes = data[data[target] != prediction].shape[0]
    return numMistakes * 1.0 / data.shape[0]
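
The quantity being computed is the fraction of data points whose predicted class differs from the true label. As a minimal, self-contained sketch on plain Python lists (classificationError and the sample values are made up for illustration and do not come from the LendingClub data):

```python
def classificationError(labels, predictions):
    # Fraction of positions where the prediction disagrees with the true label.
    numMistakes = sum(1 for y, yHat in zip(labels, predictions) if y != yHat)
    return numMistakes / float(len(labels))

labels      = [+1, -1, -1, +1, -1]
predictions = [+1, +1, -1, -1, -1]

# Two of the five predictions are wrong.
print(classificationError(labels, predictions))   # 0.4
```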

Now, let's use this function to evaluate the classification error of newMyDTree on glbObsOOB.

In [34]:
evlClassificationError(newMyDTree, glbObsOOB)
Out[34]:
0.38367083153813014

Now, evaluate the validation error using oldMyDTree.

In [35]:
evlClassificationError(oldMyDTree, glbObsOOB)
Out[35]:
0.3837785437311504

Quiz question: Is the validation error of the new decision tree (using early stopping conditions 2 and 3) lower than, higher than, or the same as that of the old decision tree from the previous assignment?

Exploring the effect of depthMax

We will compare three models trained with different values of depthMax. We intentionally picked values that range from too small, through just right, to too large.

Train three models with these parameters:

  1. depthMax02DTree: depthMax = 2 (too small)
  2. depthMax06DTree: depthMax = 6 (just right)
  3. depthMax14DTree: depthMax = 14 (may be too large)

For each of these three, we set nodeSizeMin = 0 and errReductionMin = -1.

Note: Each tree can take up to a few minutes to train. In particular, depthMax14DTree will probably take the longest to train.
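
To get a feel for why the deepest tree is the slowest to train and the most likely to overfit, note that a binary tree whose leaves sit at depth depthMax (with the root at depth 0, as in the training logs) can have at most 2^depthMax leaves. The short sketch below just evaluates that upper bound; maxLeaves is a hypothetical helper, and the trees actually learned here are smaller because many branches stop early.

```python
def maxLeaves(depthMax):
    # Upper bound on the number of leaves of a binary tree of this depth.
    return 2 ** depthMax

for depthMax in [2, 6, 14]:
    print('depthMax = %2d -> at most %d leaves' % (depthMax, maxLeaves(depthMax)))
# depthMax =  2 -> at most 4 leaves
# depthMax =  6 -> at most 64 leaves
# depthMax = 14 -> at most 16384 leaves
```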

In [36]:
depthMax02DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 2, 
                                    nodeSizeMin = 0, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
In [37]:
depthMax06DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6, 
                                    nodeSizeMin = 0, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
--------------------------------------------------------------------
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
--------------------------------------------------------------------
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
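
Many splits in the trace above send every data point to one side, for example `Split on feature grade.A. (1276, 0)`. Such a split cannot reduce the classification error, yet the tree keeps growing until it hits the depth limit. The sketch below shows one way the three early stopping checks might be arranged. It is an illustration under assumptions, not the body of `bldDecisionTree`: the function name `should_stop_early`, its arguments, and the exact comparisons (`<=`) are hypothetical, and the error values in the usage lines are made up.

```python
def should_stop_early(depth, node_size, error_before, error_after,
                      depth_max, node_size_min, err_reduction_min):
    """Return the name of the early stopping rule that fires, or None.

    Hypothetical helper: assumes the three rules are checked in this order
    and that the node-size and error-reduction rules use '<='.
    """
    # Early stopping condition 1: the depth limit has been reached.
    if depth >= depth_max:
        return 'max_depth'
    # Early stopping condition 2: the node holds too few data points.
    if node_size <= node_size_min:
        return 'min_node_size'
    # Early stopping condition 3: the best split barely reduces the error.
    if error_before - error_after <= err_reduction_min:
        return 'min_error_reduction'
    return None


# A degenerate split leaves the error unchanged (reduction = 0). With a
# threshold of -1 the third rule never fires, so the node is split anyway.
print(should_stop_early(3, 1276, 0.3, 0.3, depth_max=14,
                        node_size_min=0, err_reduction_min=-1))   # None
# With a threshold of 0 the same node would become a leaf.
print(should_stop_early(3, 1276, 0.3, 0.3, depth_max=14,
                        node_size_min=0, err_reduction_min=0.0))  # min_error_reduction
```

Under these assumptions, passing `nodeSizeMin = 0` and `errReductionMin = -1`, as the next cell does, leaves the depth limit as the only early stopping rule that can fire, which is consistent with the traces only reporting "Reached maximum depth."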
In [38]:
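# Build a deep tree (depthMax = 14). nodeSizeMin = 0 and errReductionMin = -1
# are chosen so that, in practice, only the depth limit stops growth early.
depthMax14DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 14,
                                  nodeSizeMin = 0, errReductionMin = -1)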
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Split on feature home_ownership.OTHER. (1692, 1)
--------------------------------------------------------------------
Subtree, depth = 7 (1692 data points).
Split on feature grade.F. (339, 1353)
--------------------------------------------------------------------
Subtree, depth = 8 (339 data points).
Split on feature grade.G. (0, 339)
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (339 data points).
Split on feature term. 60 months. (0, 339)
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (339 data points).
Split on feature home_ownership.MORTGAGE. (175, 164)
--------------------------------------------------------------------
Subtree, depth = 11 (175 data points).
Split on feature home_ownership.OWN. (142, 33)
--------------------------------------------------------------------
Subtree, depth = 12 (142 data points).
Split on feature emp_length.6 years. (133, 9)
--------------------------------------------------------------------
Subtree, depth = 13 (133 data points).
Split on feature home_ownership.RENT. (0, 133)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (9 data points).
Split on feature home_ownership.RENT. (0, 9)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (33 data points).
Split on feature emp_length.n/a. (31, 2)
--------------------------------------------------------------------
Subtree, depth = 13 (31 data points).
Split on feature emp_length.2 years. (30, 1)
--------------------------------------------------------------------
Subtree, depth = 14 (30 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (164 data points).
Split on feature emp_length.2 years. (159, 5)
--------------------------------------------------------------------
Subtree, depth = 12 (159 data points).
Split on feature emp_length.3 years. (148, 11)
--------------------------------------------------------------------
Subtree, depth = 13 (148 data points).
Split on feature home_ownership.OWN. (148, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (148 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (11 data points).
Split on feature home_ownership.OWN. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (5 data points).
Split on feature home_ownership.OWN. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (5 data points).
Split on feature home_ownership.RENT. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (1353 data points).
Split on feature grade.G. (1353, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (1353 data points).
Split on feature term. 60 months. (0, 1353)
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1353 data points).
Split on feature home_ownership.MORTGAGE. (710, 643)
--------------------------------------------------------------------
Subtree, depth = 11 (710 data points).
Split on feature home_ownership.OWN. (602, 108)
--------------------------------------------------------------------
Subtree, depth = 12 (602 data points).
Split on feature home_ownership.RENT. (0, 602)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (602 data points).
Split on feature emp_length.1 year. (565, 37)
--------------------------------------------------------------------
Subtree, depth = 14 (565 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (37 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (108 data points).
Split on feature home_ownership.RENT. (108, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (108 data points).
Split on feature emp_length.1 year. (100, 8)
--------------------------------------------------------------------
Subtree, depth = 14 (100 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (643 data points).
Split on feature home_ownership.OWN. (643, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (643 data points).
Split on feature home_ownership.RENT. (643, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (643 data points).
Split on feature emp_length.1 year. (602, 41)
--------------------------------------------------------------------
Subtree, depth = 14 (602 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (41 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Split on feature grade.F. (2133, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (2133 data points).
Split on feature grade.G. (2133, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (2133 data points).
Split on feature term. 60 months. (0, 2133)
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (2133 data points).
Split on feature home_ownership.MORTGAGE. (1045, 1088)
--------------------------------------------------------------------
Subtree, depth = 10 (1045 data points).
Split on feature home_ownership.OTHER. (1044, 1)
--------------------------------------------------------------------
Subtree, depth = 11 (1044 data points).
Split on feature home_ownership.OWN. (879, 165)
--------------------------------------------------------------------
Subtree, depth = 12 (879 data points).
Split on feature home_ownership.RENT. (0, 879)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (879 data points).
Split on feature emp_length.1 year. (809, 70)
--------------------------------------------------------------------
Subtree, depth = 14 (809 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (70 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (165 data points).
Split on feature emp_length.9 years. (157, 8)
--------------------------------------------------------------------
Subtree, depth = 13 (157 data points).
Split on feature home_ownership.RENT. (157, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (157 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.RENT. (8, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1088 data points).
Split on feature home_ownership.OTHER. (1088, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (1088 data points).
Split on feature home_ownership.OWN. (1088, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (1088 data points).
Split on feature home_ownership.RENT. (1088, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (1088 data points).
Split on feature emp_length.1 year. (1035, 53)
--------------------------------------------------------------------
Subtree, depth = 14 (1035 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (53 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Split on feature grade.F. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (2058 data points).
Split on feature grade.G. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (2058 data points).
Split on feature term. 60 months. (0, 2058)
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (2058 data points).
Split on feature home_ownership.MORTGAGE. (923, 1135)
--------------------------------------------------------------------
Subtree, depth = 10 (923 data points).
Split on feature home_ownership.OTHER. (922, 1)
--------------------------------------------------------------------
Subtree, depth = 11 (922 data points).
Split on feature home_ownership.OWN. (762, 160)
--------------------------------------------------------------------
Subtree, depth = 12 (762 data points).
Split on feature home_ownership.RENT. (0, 762)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (762 data points).
Split on feature emp_length.1 year. (704, 58)
--------------------------------------------------------------------
Subtree, depth = 14 (704 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (58 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (160 data points).
Split on feature home_ownership.RENT. (160, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (160 data points).
Split on feature emp_length.1 year. (154, 6)
--------------------------------------------------------------------
Subtree, depth = 14 (154 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1135 data points).
Split on feature home_ownership.OTHER. (1135, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (1135 data points).
Split on feature home_ownership.OWN. (1135, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (1135 data points).
Split on feature home_ownership.RENT. (1135, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (1135 data points).
Split on feature emp_length.1 year. (1096, 39)
--------------------------------------------------------------------
Subtree, depth = 14 (1096 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (39 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Split on feature grade.F. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (2190 data points).
Split on feature grade.G. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (2190 data points).
Split on feature term. 60 months. (0, 2190)
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (2190 data points).
Split on feature home_ownership.MORTGAGE. (803, 1387)
--------------------------------------------------------------------
Subtree, depth = 10 (803 data points).
Split on feature emp_length.4 years. (746, 57)
--------------------------------------------------------------------
Subtree, depth = 11 (746 data points).
Split on feature home_ownership.OTHER. (746, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (746 data points).
Split on feature home_ownership.OWN. (598, 148)
--------------------------------------------------------------------
Subtree, depth = 13 (598 data points).
Split on feature home_ownership.RENT. (0, 598)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (598 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (148 data points).
Split on feature emp_length.< 1 year. (137, 11)
--------------------------------------------------------------------
Subtree, depth = 14 (137 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (57 data points).
Split on feature home_ownership.OTHER. (57, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (57 data points).
Split on feature home_ownership.OWN. (49, 8)
--------------------------------------------------------------------
Subtree, depth = 13 (49 data points).
Split on feature home_ownership.RENT. (0, 49)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (49 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.RENT. (8, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (8 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1387 data points).
Split on feature emp_length.6 years. (1313, 74)
--------------------------------------------------------------------
Subtree, depth = 11 (1313 data points).
Split on feature home_ownership.OTHER. (1313, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (1313 data points).
Split on feature home_ownership.OWN. (1313, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (1313 data points).
Split on feature home_ownership.RENT. (1313, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (1313 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (74 data points).
Split on feature home_ownership.OTHER. (74, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (74 data points).
Split on feature home_ownership.OWN. (74, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (74 data points).
Split on feature home_ownership.RENT. (74, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (74 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
--------------------------------------------------------------------
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (969 data points).
Split on feature grade.E. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (969 data points).
Split on feature grade.F. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (969 data points).
Split on feature grade.G. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (969 data points).
Split on feature term. 60 months. (0, 969)
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (969 data points).
Split on feature home_ownership.MORTGAGE. (367, 602)
--------------------------------------------------------------------
Subtree, depth = 11 (367 data points).
Split on feature home_ownership.OTHER. (367, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (367 data points).
Split on feature home_ownership.OWN. (291, 76)
--------------------------------------------------------------------
Subtree, depth = 13 (291 data points).
Split on feature home_ownership.RENT. (0, 291)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (291 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (76 data points).
Split on feature emp_length.9 years. (71, 5)
--------------------------------------------------------------------
Subtree, depth = 14 (71 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (602 data points).
Split on feature emp_length.9 years. (580, 22)
--------------------------------------------------------------------
Subtree, depth = 12 (580 data points).
Split on feature emp_length.3 years. (545, 35)
--------------------------------------------------------------------
Subtree, depth = 13 (545 data points).
Split on feature emp_length.4 years. (506, 39)
--------------------------------------------------------------------
Subtree, depth = 14 (506 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (39 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (35 data points).
Split on feature home_ownership.OTHER. (35, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (35 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (22 data points).
Split on feature home_ownership.OTHER. (22, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (22 data points).
Split on feature home_ownership.OWN. (22, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (22 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
--------------------------------------------------------------------
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (34 data points).
Split on feature grade.D. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (34 data points).
Split on feature grade.E. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (34 data points).
Split on feature grade.F. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (34 data points).
Split on feature grade.G. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (34 data points).
Split on feature term. 60 months. (0, 34)
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (34 data points).
Split on feature home_ownership.OTHER. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (34 data points).
Split on feature home_ownership.OWN. (25, 9)
--------------------------------------------------------------------
Subtree, depth = 13 (25 data points).
Split on feature home_ownership.RENT. (0, 25)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (25 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (9 data points).
Split on feature home_ownership.RENT. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (45 data points).
Split on feature grade.D. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (45 data points).
Split on feature grade.E. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (45 data points).
Split on feature grade.F. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (45 data points).
Split on feature grade.G. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (45 data points).
Split on feature term. 60 months. (0, 45)
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (45 data points).
Split on feature home_ownership.OTHER. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (45 data points).
Split on feature home_ownership.OWN. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (45 data points).
Split on feature home_ownership.RENT. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (85 data points).
Split on feature grade.D. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (85 data points).
Split on feature grade.E. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (85 data points).
Split on feature grade.F. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (85 data points).
Split on feature grade.G. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (85 data points).
Split on feature term. 60 months. (0, 85)
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (85 data points).
Split on feature home_ownership.MORTGAGE. (26, 59)
--------------------------------------------------------------------
Subtree, depth = 12 (26 data points).
Split on feature emp_length.3 years. (24, 2)
--------------------------------------------------------------------
Subtree, depth = 13 (24 data points).
Split on feature home_ownership.OTHER. (24, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (24 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (59 data points).
Split on feature home_ownership.OTHER. (59, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (59 data points).
Split on feature home_ownership.OWN. (59, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (59 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (11 data points).
Split on feature grade.D. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (11 data points).
Split on feature grade.E. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (11 data points).
Split on feature grade.F. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (11 data points).
Split on feature grade.G. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (11 data points).
Split on feature term. 60 months. (0, 11)
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (11 data points).
Split on feature home_ownership.MORTGAGE. (8, 3)
--------------------------------------------------------------------
Subtree, depth = 12 (8 data points).
Split on feature home_ownership.OTHER. (8, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (8 data points).
Split on feature home_ownership.OWN. (6, 2)
--------------------------------------------------------------------
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (2 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (3 data points).
Split on feature home_ownership.OTHER. (3, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (3 data points).
Split on feature home_ownership.OWN. (3, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (3 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (5 data points).
Split on feature grade.E. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (5 data points).
Split on feature grade.F. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (5 data points).
Split on feature grade.G. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (5 data points).
Split on feature term. 60 months. (0, 5)
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (5 data points).
Split on feature home_ownership.MORTGAGE. (2, 3)
--------------------------------------------------------------------
Subtree, depth = 11 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (3 data points).
Split on feature home_ownership.OTHER. (3, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (3 data points).
Split on feature home_ownership.OWN. (3, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (3 data points).
Split on feature home_ownership.RENT. (3, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (3 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Split on feature grade.A. (15839, 4799)
--------------------------------------------------------------------
Subtree, depth = 7 (15839 data points).
Split on feature home_ownership.OTHER. (15811, 28)
--------------------------------------------------------------------
Subtree, depth = 8 (15811 data points).
Split on feature grade.B. (6894, 8917)
--------------------------------------------------------------------
Subtree, depth = 9 (6894 data points).
Split on feature home_ownership.MORTGAGE. (4102, 2792)
--------------------------------------------------------------------
Subtree, depth = 10 (4102 data points).
Split on feature emp_length.4 years. (3768, 334)
--------------------------------------------------------------------
Subtree, depth = 11 (3768 data points).
Split on feature emp_length.9 years. (3639, 129)
--------------------------------------------------------------------
Subtree, depth = 12 (3639 data points).
Split on feature emp_length.2 years. (3123, 516)
--------------------------------------------------------------------
Subtree, depth = 13 (3123 data points).
Split on feature grade.C. (0, 3123)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (3123 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (516 data points).
Split on feature home_ownership.OWN. (458, 58)
--------------------------------------------------------------------
Subtree, depth = 14 (458 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (58 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (129 data points).
Split on feature home_ownership.OWN. (113, 16)
--------------------------------------------------------------------
Subtree, depth = 13 (113 data points).
Split on feature grade.C. (0, 113)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (113 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (16 data points).
Split on feature grade.C. (0, 16)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (16 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 11 (334 data points).
Split on feature grade.C. (0, 334)
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (334 data points).
Split on feature term. 60 months. (334, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (334 data points).
Split on feature home_ownership.OWN. (286, 48)
--------------------------------------------------------------------
Subtree, depth = 14 (286 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (48 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (2792 data points).
Split on feature emp_length.2 years. (2562, 230)
--------------------------------------------------------------------
Subtree, depth = 11 (2562 data points).
Split on feature emp_length.5 years. (2335, 227)
--------------------------------------------------------------------
Subtree, depth = 12 (2335 data points).
Split on feature grade.C. (0, 2335)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (2335 data points).
Split on feature term. 60 months. (2335, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (2335 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (227 data points).
Split on feature grade.C. (0, 227)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (227 data points).
Split on feature term. 60 months. (227, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (227 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (230 data points).
Split on feature grade.C. (0, 230)
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (230 data points).
Split on feature term. 60 months. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (230 data points).
Split on feature home_ownership.OWN. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (8917 data points).
Split on feature grade.C. (8917, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (8917 data points).
Split on feature term. 60 months. (8917, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (8917 data points).
Split on feature home_ownership.MORTGAGE. (4748, 4169)
--------------------------------------------------------------------
Subtree, depth = 12 (4748 data points).
Split on feature home_ownership.OWN. (4089, 659)
--------------------------------------------------------------------
Subtree, depth = 13 (4089 data points).
Split on feature home_ownership.RENT. (0, 4089)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (4089 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (659 data points).
Split on feature home_ownership.RENT. (659, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (659 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (4169 data points).
Split on feature home_ownership.OWN. (4169, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (4169 data points).
Split on feature home_ownership.RENT. (4169, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (4169 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (28 data points).
Split on feature grade.B. (11, 17)
--------------------------------------------------------------------
Subtree, depth = 9 (11 data points).
Split on feature emp_length.6 years. (10, 1)
--------------------------------------------------------------------
Subtree, depth = 10 (10 data points).
Split on feature grade.C. (0, 10)
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (10 data points).
Split on feature term. 60 months. (10, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (10 data points).
Split on feature home_ownership.MORTGAGE. (10, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (10 data points).
Split on feature home_ownership.OWN. (10, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (10 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (17 data points).
Split on feature emp_length.1 year. (16, 1)
--------------------------------------------------------------------
Subtree, depth = 10 (16 data points).
Split on feature emp_length.3 years. (15, 1)
--------------------------------------------------------------------
Subtree, depth = 11 (15 data points).
Split on feature emp_length.4 years. (14, 1)
--------------------------------------------------------------------
Subtree, depth = 12 (14 data points).
Split on feature emp_length.< 1 year. (13, 1)
--------------------------------------------------------------------
Subtree, depth = 13 (13 data points).
Split on feature grade.C. (13, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (13 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (4799 data points).
Split on feature grade.B. (4799, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (4799 data points).
Split on feature grade.C. (4799, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (4799 data points).
Split on feature term. 60 months. (4799, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (4799 data points).
Split on feature home_ownership.MORTGAGE. (2163, 2636)
--------------------------------------------------------------------
Subtree, depth = 11 (2163 data points).
Split on feature home_ownership.OTHER. (2154, 9)
--------------------------------------------------------------------
Subtree, depth = 12 (2154 data points).
Split on feature home_ownership.OWN. (1753, 401)
--------------------------------------------------------------------
Subtree, depth = 13 (1753 data points).
Split on feature home_ownership.RENT. (0, 1753)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (1753 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (401 data points).
Split on feature home_ownership.RENT. (401, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (401 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (9 data points).
Split on feature emp_length.3 years. (8, 1)
--------------------------------------------------------------------
Subtree, depth = 13 (8 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (2636 data points).
Split on feature home_ownership.OTHER. (2636, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (2636 data points).
Split on feature home_ownership.OWN. (2636, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (2636 data points).
Split on feature home_ownership.RENT. (2636, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (2636 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Split on feature grade.A. (96, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (96 data points).
Split on feature grade.B. (96, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (96 data points).
Split on feature grade.C. (96, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (96 data points).
Split on feature term. 60 months. (96, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (96 data points).
Split on feature home_ownership.MORTGAGE. (44, 52)
--------------------------------------------------------------------
Subtree, depth = 11 (44 data points).
Split on feature emp_length.3 years. (43, 1)
--------------------------------------------------------------------
Subtree, depth = 12 (43 data points).
Split on feature emp_length.7 years. (42, 1)
--------------------------------------------------------------------
Subtree, depth = 13 (42 data points).
Split on feature emp_length.8 years. (41, 1)
--------------------------------------------------------------------
Subtree, depth = 14 (41 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (52 data points).
Split on feature emp_length.2 years. (47, 5)
--------------------------------------------------------------------
Subtree, depth = 12 (47 data points).
Split on feature home_ownership.OTHER. (47, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (47 data points).
Split on feature home_ownership.OWN. (47, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (47 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (5 data points).
Split on feature home_ownership.OTHER. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (5 data points).
Split on feature home_ownership.OWN. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Split on feature home_ownership.OTHER. (701, 1)
--------------------------------------------------------------------
Subtree, depth = 7 (701 data points).
Split on feature grade.B. (317, 384)
--------------------------------------------------------------------
Subtree, depth = 8 (317 data points).
Split on feature grade.C. (1, 316)
--------------------------------------------------------------------
Subtree, depth = 9 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (316 data points).
Split on feature grade.G. (316, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (316 data points).
Split on feature term. 60 months. (316, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (316 data points).
Split on feature home_ownership.MORTGAGE. (189, 127)
--------------------------------------------------------------------
Subtree, depth = 12 (189 data points).
Split on feature home_ownership.OWN. (139, 50)
--------------------------------------------------------------------
Subtree, depth = 13 (139 data points).
Split on feature home_ownership.RENT. (0, 139)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (139 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (50 data points).
Split on feature home_ownership.RENT. (50, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (50 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (127 data points).
Split on feature home_ownership.OWN. (127, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (127 data points).
Split on feature home_ownership.RENT. (127, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (127 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (384 data points).
Split on feature grade.C. (384, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (384 data points).
Split on feature grade.G. (384, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (384 data points).
Split on feature term. 60 months. (384, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (384 data points).
Split on feature home_ownership.MORTGAGE. (210, 174)
--------------------------------------------------------------------
Subtree, depth = 12 (210 data points).
Split on feature home_ownership.OWN. (148, 62)
--------------------------------------------------------------------
Subtree, depth = 13 (148 data points).
Split on feature home_ownership.RENT. (0, 148)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (148 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (62 data points).
Split on feature home_ownership.RENT. (62, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (62 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (174 data points).
Split on feature home_ownership.OWN. (174, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (174 data points).
Split on feature home_ownership.RENT. (174, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (174 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Split on feature grade.B. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (230 data points).
Split on feature grade.C. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (230 data points).
Split on feature grade.G. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (230 data points).
Split on feature term. 60 months. (230, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (230 data points).
Split on feature home_ownership.MORTGAGE. (119, 111)
--------------------------------------------------------------------
Subtree, depth = 11 (119 data points).
Split on feature home_ownership.OTHER. (119, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (119 data points).
Split on feature home_ownership.OWN. (71, 48)
--------------------------------------------------------------------
Subtree, depth = 13 (71 data points).
Split on feature home_ownership.RENT. (0, 71)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (71 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (48 data points).
Split on feature home_ownership.RENT. (48, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (48 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (111 data points).
Split on feature home_ownership.OTHER. (111, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (111 data points).
Split on feature home_ownership.OWN. (111, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (111 data points).
Split on feature home_ownership.RENT. (111, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (111 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (347 data points).
Split on feature grade.B. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (347 data points).
Split on feature grade.C. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (347 data points).
Split on feature grade.G. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (347 data points).
Split on feature term. 60 months. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (347 data points).
Split on feature home_ownership.MORTGAGE. (237, 110)
--------------------------------------------------------------------
Subtree, depth = 11 (237 data points).
Split on feature home_ownership.OTHER. (235, 2)
--------------------------------------------------------------------
Subtree, depth = 12 (235 data points).
Split on feature home_ownership.OWN. (203, 32)
--------------------------------------------------------------------
Subtree, depth = 13 (203 data points).
Split on feature home_ownership.RENT. (0, 203)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (203 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (32 data points).
Split on feature home_ownership.RENT. (32, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (32 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (110 data points).
Split on feature home_ownership.OTHER. (110, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (110 data points).
Split on feature home_ownership.OWN. (110, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (110 data points).
Split on feature home_ownership.RENT. (110, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (110 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Split on feature grade.A. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (9 data points).
Split on feature grade.B. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (9 data points).
Split on feature grade.C. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (9 data points).
Split on feature grade.G. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 10 (9 data points).
Split on feature term. 60 months. (9, 0)
--------------------------------------------------------------------
Subtree, depth = 11 (9 data points).
Split on feature home_ownership.MORTGAGE. (6, 3)
--------------------------------------------------------------------
Subtree, depth = 12 (6 data points).
Split on feature home_ownership.OTHER. (6, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (6 data points).
Split on feature home_ownership.RENT. (0, 6)
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (3 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (1276 data points).
Split on feature grade.F. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (1276 data points).
Split on feature grade.G. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (1276 data points).
Split on feature term. 60 months. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (1276 data points).
Split on feature home_ownership.MORTGAGE. (855, 421)
--------------------------------------------------------------------
Subtree, depth = 10 (855 data points).
Split on feature home_ownership.OTHER. (849, 6)
--------------------------------------------------------------------
Subtree, depth = 11 (849 data points).
Split on feature home_ownership.OWN. (737, 112)
--------------------------------------------------------------------
Subtree, depth = 12 (737 data points).
Split on feature home_ownership.RENT. (0, 737)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (737 data points).
Split on feature emp_length.1 year. (670, 67)
--------------------------------------------------------------------
Subtree, depth = 14 (670 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (67 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (112 data points).
Split on feature home_ownership.RENT. (112, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (112 data points).
Split on feature emp_length.1 year. (102, 10)
--------------------------------------------------------------------
Subtree, depth = 14 (102 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (10 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (6 data points).
Split on feature home_ownership.OWN. (6, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (6 data points).
Split on feature home_ownership.RENT. (6, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (6 data points).
Split on feature emp_length.1 year. (6, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (6 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (421 data points).
Split on feature emp_length.6 years. (408, 13)
--------------------------------------------------------------------
Subtree, depth = 11 (408 data points).
Split on feature home_ownership.OTHER. (408, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (408 data points).
Split on feature home_ownership.OWN. (408, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (408 data points).
Split on feature home_ownership.RENT. (408, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (408 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (13 data points).
Split on feature home_ownership.OTHER. (13, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (13 data points).
Split on feature home_ownership.OWN. (13, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (13 data points).
Split on feature home_ownership.RENT. (13, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (13 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Split on feature grade.F. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 7 (4701 data points).
Split on feature grade.G. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 8 (4701 data points).
Split on feature term. 60 months. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 9 (4701 data points).
Split on feature home_ownership.MORTGAGE. (3047, 1654)
--------------------------------------------------------------------
Subtree, depth = 10 (3047 data points).
Split on feature home_ownership.OTHER. (3037, 10)
--------------------------------------------------------------------
Subtree, depth = 11 (3037 data points).
Split on feature home_ownership.OWN. (2633, 404)
--------------------------------------------------------------------
Subtree, depth = 12 (2633 data points).
Split on feature home_ownership.RENT. (0, 2633)
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (2633 data points).
Split on feature emp_length.1 year. (2392, 241)
--------------------------------------------------------------------
Subtree, depth = 14 (2392 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (241 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 12 (404 data points).
Split on feature home_ownership.RENT. (404, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (404 data points).
Split on feature emp_length.1 year. (374, 30)
--------------------------------------------------------------------
Subtree, depth = 14 (374 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (30 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (10 data points).
Split on feature home_ownership.OWN. (10, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (10 data points).
Split on feature home_ownership.RENT. (10, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (10 data points).
Split on feature emp_length.1 year. (9, 1)
--------------------------------------------------------------------
Subtree, depth = 14 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (1 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 10 (1654 data points).
Split on feature emp_length.5 years. (1532, 122)
--------------------------------------------------------------------
Subtree, depth = 11 (1532 data points).
Split on feature emp_length.3 years. (1414, 118)
--------------------------------------------------------------------
Subtree, depth = 12 (1414 data points).
Split on feature emp_length.9 years. (1351, 63)
--------------------------------------------------------------------
Subtree, depth = 13 (1351 data points).
Split on feature home_ownership.OTHER. (1351, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (1351 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (63 data points).
Split on feature home_ownership.OTHER. (63, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (63 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (118 data points).
Split on feature home_ownership.OTHER. (118, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (118 data points).
Split on feature home_ownership.OWN. (118, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (118 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 11 (122 data points).
Split on feature home_ownership.OTHER. (122, 0)
--------------------------------------------------------------------
Subtree, depth = 12 (122 data points).
Split on feature home_ownership.OWN. (122, 0)
--------------------------------------------------------------------
Subtree, depth = 13 (122 data points).
Split on feature home_ownership.RENT. (122, 0)
--------------------------------------------------------------------
Subtree, depth = 14 (122 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 14 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 13 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 12 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 9 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 8 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 7 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.

Evaluating the models

We now evaluate the three models on the training and validation data, starting with the classification error on the training data:

In [43]:
print("Training data, classification error (model 1): %.4f" %
      evlClassificationError(depthMax02DTree, glbObsFit))
print("Training data, classification error (model 2): %.4f" %
      evlClassificationError(depthMax06DTree, glbObsFit))
print("Training data, classification error (model 3): %.4f" %
      evlClassificationError(depthMax14DTree, glbObsFit))
Training data, classification error (model 1): 0.4000
Training data, classification error (model 2): 0.3819
Training data, classification error (model 3): 0.3745
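
For reference, classification error is the fraction of data points whose predicted label differs from the true label. The sketch below illustrates that definition on made-up labels. Note that `classification_error` is a hypothetical helper written for this illustration; it is not the `evlClassificationError` function used above, which takes a tree and a dataset.

```python
def classification_error(predictions, labels):
    """Fraction of predictions that differ from the true labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("cannot compute an error on zero data points")
    # Count the positions where the prediction disagrees with the label.
    mistakes = sum(1 for p, y in zip(predictions, labels) if p != y)
    return mistakes / float(len(labels))

# Toy +1 / -1 labels (made-up values, not taken from the LendingClub data).
toy_labels      = [+1, -1, +1, +1, -1]
toy_predictions = [+1, +1, +1, -1, -1]

toy_error = classification_error(toy_predictions, toy_labels)
print("Toy classification error: %.4f" % toy_error)
```

Two of the five toy predictions are wrong, so the sketch reports an error of 0.4000.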

Now evaluate the classification error on the validation data.

In [44]:
print("Validation data, classification error (model 1): %.4f" %
      evlClassificationError(depthMax02DTree, glbObsOOB))
print("Validation data, classification error (model 2): %.4f" %
      evlClassificationError(depthMax06DTree, glbObsOOB))
print("Validation data, classification error (model 3): %.4f" %
      evlClassificationError(depthMax14DTree, glbObsOOB))
Validation data, classification error (model 1): 0.3981
Validation data, classification error (model 2): 0.3838
Validation data, classification error (model 3): 0.3800

Quiz Question: Which tree has the smallest error on the validation data?

Quiz Question: Does the tree with the smallest error on the training data also have the smallest error on the validation data?

Quiz Question: Is it always true that the tree with the lowest classification error on the training set also has the lowest classification error on the validation set?

Measuring the complexity of the tree

Recall from the lecture that deeper trees are more complex. We will measure the complexity of a tree as

  complexity(T) = number of leaves in the tree T

Here, we provide a function getNLeaves that counts the number of leaves in a tree. Using this implementation, compute the number of leaves in depthMax02DTree, depthMax06DTree, and depthMax14DTree.

In [46]:
def getNLeaves(tree):
    if tree['isLeaf']:
        return 1
    return getNLeaves(tree['lft']) + getNLeaves(tree['rgt'])
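
To make the distinction between leaves and nodes concrete, here is a small sketch on a hand-built toy tree. It repeats `getNLeaves` so that it runs on its own, and it assumes that only the `isLeaf`, `lft`, and `rgt` keys matter for counting. `getNNodes` is a hypothetical helper added purely for comparison.

```python
def getNLeaves(tree):
    # A leaf contributes 1; an internal node contributes the leaves below it.
    if tree['isLeaf']:
        return 1
    return getNLeaves(tree['lft']) + getNLeaves(tree['rgt'])

def getNNodes(tree):
    # Hypothetical helper: counts every node, internal nodes and leaves alike.
    if tree['isLeaf']:
        return 1
    return 1 + getNNodes(tree['lft']) + getNNodes(tree['rgt'])

# Hand-built toy tree: the root splits once, and its left child splits again.
leaf = {'isLeaf': True}
toy_tree = {
    'isLeaf': False,
    'lft': {'isLeaf': False, 'lft': leaf, 'rgt': leaf},
    'rgt': leaf,
}

toy_leaves = getNLeaves(toy_tree)
toy_nodes = getNNodes(toy_tree)
print("leaves: %d, nodes: %d" % (toy_leaves, toy_nodes))
```

The toy tree has 3 leaves and 5 nodes in total, so the complexity measure defined above would be 3.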

Compute the number of leaves in depthMax02DTree, depthMax06DTree, and depthMax14DTree.

In [47]:
print(getNLeaves(depthMax02DTree))
print(getNLeaves(depthMax06DTree))
print(getNLeaves(depthMax14DTree))
4
41
341
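
As a rough way to relate complexity to accuracy, the sketch below pairs the leaf counts just printed with the validation errors reported earlier and shows how much the validation error changes as leaves are added. The numbers are copied by hand from the outputs above; the variable names are only illustrative.

```python
# Values copied from the outputs above, in model order (depthMax = 2, 6, 14).
n_leaves = [4, 41, 341]
validation_error = [0.3981, 0.3838, 0.3800]

steps = []
for i in range(1, len(n_leaves)):
    extra_leaves = n_leaves[i] - n_leaves[i - 1]
    error_drop = validation_error[i - 1] - validation_error[i]
    steps.append((extra_leaves, error_drop))
    print("model %d -> model %d: +%d leaves, validation error drops by %.4f"
          % (i, i + 1, extra_leaves, error_drop))
```

Going from 4 to 41 leaves lowers the validation error by about 0.014, while going from 41 to 341 leaves lowers it by only about 0.004.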

Quiz Question: Which tree has the largest complexity?

Quiz Question: Is it always true that the most complex tree will result in the lowest classification error on the validation set (glbObsOOB)?

Exploring the effect of errReductionMin

We will compare three models trained with different values of the minimum error reduction stopping criterion, errReductionMin. We intentionally picked values at the extremes (negative, just right, and too positive).

Train three models with these parameters:

  1. errRedMinM1DTree: errReductionMin = -1 (ignoring this early stopping condition)
  2. errRedMinS0DTree: errReductionMin = 0 (just right)
  3. errRedMinP5DTree: errReductionMin = 5 (too positive)

For each of these three models, we set depthMax = 6 and nodeSizeMin = 0.

Note: Each tree can take up to 30 seconds to train.
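
The three settings above can be understood through a minimal sketch of the early stopping check. It assumes the check compares the drop in classification error against the threshold with `<=`; the actual comparison inside `bldDecisionTree` may differ, and `should_stop_on_error_reduction` is a hypothetical name used only here.

```python
def should_stop_on_error_reduction(error_before, error_after, errReductionMin):
    """Return True when a split does not reduce the error by more than the threshold.

    Assumption: the check is `error_before - error_after <= errReductionMin`.
    """
    return (error_before - error_after) <= errReductionMin

# Made-up classification errors for two candidate splits.
candidate_splits = {
    'helpful split': (0.40, 0.38),   # error reduction of about 0.02
    'useless split': (0.40, 0.40),   # error reduction of exactly 0
}

results = {}
for name, (error_before, error_after) in sorted(candidate_splits.items()):
    for threshold in (-1, 0, 5):
        stop = should_stop_on_error_reduction(error_before, error_after, threshold)
        results[(name, threshold)] = stop
        print("%s, errReductionMin = %2d -> stop early: %s" % (name, threshold, stop))
```

Because classification error lies between 0 and 1, an error reduction can never exceed 1, so under this assumption a threshold of 5 stops every split, a threshold of -1 stops neither toy split, and a threshold of 0 stops only the split that does not improve the error.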

In [48]:
errRedMinM1DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 0, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
--------------------------------------------------------------------
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
--------------------------------------------------------------------
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
In [49]:
errRedMinS0DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 0, errReductionMin =  0)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Early stopping condition 3 reached. Minimum error reduction.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Early stopping condition 3 reached. Minimum error reduction.
In [50]:
errRedMinP5DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 0, errReductionMin = +5)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Early stopping condition 3 reached. Minimum error reduction.

Calculate the classification error of each model (errRedMinM1DTree, errRedMinS0DTree, and errRedMinP5DTree) on the validation set.

In [52]:
print("Validation data, classification error (model 4): %.4f" %
      evlClassificationError(errRedMinM1DTree, glbObsOOB))
print("Validation data, classification error (model 5): %.4f" %
      evlClassificationError(errRedMinS0DTree, glbObsOOB))
print("Validation data, classification error (model 6): %.4f" %
      evlClassificationError(errRedMinP5DTree, glbObsOOB))
Validation data, classification error (model 4): 0.3838
Validation data, classification error (model 5): 0.3838
Validation data, classification error (model 6): 0.5034
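
The three models above differ only in errReductionMin. The following is a minimal sketch of early stopping condition 3, not the code inside bldDecisionTree. It assumes that the error reduction of a split is the classification error before the split minus the error after it (both as fractions of the node's data points), and that a node stops when that reduction is at or below the threshold. The helper names and the example error values are hypothetical.

```python
def error_reduction(error_before_split, error_after_split):
    """Drop in classification error achieved by a split (assumed definition)."""
    return error_before_split - error_after_split


def stops_on_error_reduction(error_before_split, error_after_split,
                             err_reduction_min):
    """True if the split does not reduce the error by more than the threshold."""
    reduction = error_reduction(error_before_split, error_after_split)
    return reduction <= err_reduction_min


# Hypothetical node: the best split lowers the error from 0.40 to 0.38.
before, after = 0.40, 0.38
for threshold in (-1, 0, 5):
    print(threshold, stops_on_error_reduction(before, after, threshold))
```

Under these assumptions both errors lie between 0 and 1, so the reduction can never exceed 1. A threshold of 5 therefore stops at the root, which is consistent with the single-node tree logged for errRedMinP5DTree, while a threshold of -1 is practically never met.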

Using the getNLeaves function, compute the number of leaves in each of the models (errRedMinM1DTree, errRedMinS0DTree, and errRedMinP5DTree).

In [53]:
print(getNLeaves(errRedMinM1DTree))
print(getNLeaves(errRedMinS0DTree))
print(getNLeaves(errRedMinP5DTree))
41
13
1
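
For readers who want to see what a leaf count involves, here is a minimal sketch. It assumes the tree is stored as nested dictionaries with an 'is_leaf' flag and 'left' / 'right' children. That layout, the function name count_leaves, and the toy tree are assumptions for illustration and may differ from the getNLeaves implementation used in this notebook.

```python
def count_leaves(tree):
    """Count the leaves of a tree stored as nested dicts (assumed layout)."""
    if tree['is_leaf']:
        return 1
    return count_leaves(tree['left']) + count_leaves(tree['right'])


# Hypothetical toy tree: the root splits once, and its right child splits again.
leaf = {'is_leaf': True}
toy_tree = {
    'is_leaf': False,
    'left': leaf,
    'right': {'is_leaf': False, 'left': leaf, 'right': leaf},
}
print(count_leaves(toy_tree))  # 3
```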

Quiz Question: Using the complexity definition above, which model (errRedMinM1DTree, errRedMinS0DTree, or errRedMinP5DTree) has the largest complexity?

Did this match your expectation?

Quiz Question: errRedMinM1DTree and errRedMinS0DTree have similar classification error on the validation set, but errRedMinS0DTree has lower complexity. Should you pick errRedMinS0DTree over errRedMinM1DTree?
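
One way to reason about this trade-off is to prefer the model with the fewest leaves among those whose validation error is close to the best. The sketch below applies that rule to the errors and leaf counts printed above. The function name and the tolerance of 0.001 are hypothetical choices for illustration, not part of the assignment.

```python
def simplest_within_tolerance(results, tolerance=0.001):
    """Pick the model with the fewest leaves among those whose validation
    error is within `tolerance` of the lowest error.

    `results` maps a model name to (validation_error, number_of_leaves).
    """
    best_error = min(error for error, _ in results.values())
    candidates = [(leaves, name)
                  for name, (error, leaves) in results.items()
                  if error - best_error <= tolerance]
    return min(candidates)[1]


# Validation errors and leaf counts reported above.
results = {
    'errRedMinM1DTree': (0.3838, 41),
    'errRedMinS0DTree': (0.3838, 13),
    'errRedMinP5DTree': (0.5034, 1),
}
print(simplest_within_tolerance(results))  # errRedMinS0DTree
```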

Exploring the effect of nodeSizeMin

We will compare three models trained with different values of the stopping criterion. Again, we intentionally picked models at the extreme ends (too small, just right, and too large).

Train three models with these parameters:

  1. nodeSizeMin0e0DTree: nodeSizeMin = 0 (too small)
  2. nodeSizeMin2e3DTree: nodeSizeMin = 2000 (just right)
  3. nodeSizeMin5e4DTree: nodeSizeMin = 50000 (too large)

For each of these three, we set depthMax = 6 and errReductionMin = -1.

Note: Each tree can take up to 30 seconds to train.
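
As a rough sketch of how early stopping condition 2 separates these three settings, assume a node stops splitting when its number of data points is less than or equal to nodeSizeMin. The helper name below is hypothetical, and the node sizes are a sample of those that appear in the training logs in this section.

```python
def reached_min_node_size(num_data_points, node_size_min):
    """True if the node is too small to split further (assumed <= comparison)."""
    return num_data_points <= node_size_min


# A sample of node sizes taken from the training logs.
node_sizes = [37224, 2190, 2058, 1276, 1048, 932, 358, 101]

for node_size_min in (0, 2000, 50000):
    stopped = [n for n in node_sizes
               if reached_min_node_size(n, node_size_min)]
    print(node_size_min, stopped)
```

With nodeSizeMin = 0 none of these nodes stop on size, with 2000 the nodes of 1276 points or fewer stop, and with 50000 even the root (37224 points) stops, which matches the behavior in the logs.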

In [54]:
nodeSizeMin0e0DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 0e0, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Split on feature emp_length.5 years. (969, 79)
--------------------------------------------------------------------
Subtree, depth = 4 (969 data points).
Split on feature grade.C. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (969 data points).
Split on feature grade.D. (969, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (969 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (79 data points).
Split on feature home_ownership.MORTGAGE. (34, 45)
--------------------------------------------------------------------
Subtree, depth = 5 (34 data points).
Split on feature grade.C. (34, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (34 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (45 data points).
Split on feature grade.C. (45, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (45 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Split on feature emp_length.n/a. (96, 5)
--------------------------------------------------------------------
Subtree, depth = 3 (96 data points).
Split on feature emp_length.< 1 year. (85, 11)
--------------------------------------------------------------------
Subtree, depth = 4 (85 data points).
Split on feature grade.B. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (85 data points).
Split on feature grade.C. (85, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (85 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (11 data points).
Split on feature grade.B. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature grade.C. (11, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (11 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (5 data points).
Split on feature grade.B. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (5 data points).
Split on feature grade.C. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (5 data points).
Split on feature grade.D. (5, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (5 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Split on feature grade.A. (702, 230)
--------------------------------------------------------------------
Subtree, depth = 6 (702 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (230 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Split on feature emp_length.8 years. (347, 11)
--------------------------------------------------------------------
Subtree, depth = 5 (347 data points).
Split on feature grade.A. (347, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (347 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (11 data points).
Split on feature home_ownership.OWN. (9, 2)
--------------------------------------------------------------------
Subtree, depth = 6 (9 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Split on feature grade.A. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (1276 data points).
Split on feature grade.B. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (1276 data points).
Split on feature grade.C. (1276, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (1276 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
In [55]:
nodeSizeMin2e3DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 2e3, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Split on feature term. 36 months. (9223, 28001)
--------------------------------------------------------------------
Subtree, depth = 1 (9223 data points).
Split on feature grade.A. (9122, 101)
--------------------------------------------------------------------
Subtree, depth = 2 (9122 data points).
Split on feature grade.B. (8074, 1048)
--------------------------------------------------------------------
Subtree, depth = 3 (8074 data points).
Split on feature grade.C. (5884, 2190)
--------------------------------------------------------------------
Subtree, depth = 4 (5884 data points).
Split on feature grade.D. (3826, 2058)
--------------------------------------------------------------------
Subtree, depth = 5 (3826 data points).
Split on feature grade.E. (1693, 2133)
--------------------------------------------------------------------
Subtree, depth = 6 (1693 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (2133 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (2058 data points).
Split on feature grade.E. (2058, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2058 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (2190 data points).
Split on feature grade.D. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (2190 data points).
Split on feature grade.E. (2190, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (2190 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (1048 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 2 (101 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 1 (28001 data points).
Split on feature grade.D. (23300, 4701)
--------------------------------------------------------------------
Subtree, depth = 2 (23300 data points).
Split on feature grade.E. (22024, 1276)
--------------------------------------------------------------------
Subtree, depth = 3 (22024 data points).
Split on feature grade.F. (21666, 358)
--------------------------------------------------------------------
Subtree, depth = 4 (21666 data points).
Split on feature emp_length.n/a. (20734, 932)
--------------------------------------------------------------------
Subtree, depth = 5 (20734 data points).
Split on feature grade.G. (20638, 96)
--------------------------------------------------------------------
Subtree, depth = 6 (20638 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (96 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 5 (932 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 4 (358 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 3 (1276 data points).
Early stopping condition 2 reached. Reached minimum node size.
--------------------------------------------------------------------
Subtree, depth = 2 (4701 data points).
Split on feature grade.A. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 3 (4701 data points).
Split on feature grade.B. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 4 (4701 data points).
Split on feature grade.C. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 5 (4701 data points).
Split on feature grade.E. (4701, 0)
--------------------------------------------------------------------
Subtree, depth = 6 (4701 data points).
Early stopping condition 1 reached. Reached maximum depth.
--------------------------------------------------------------------
Subtree, depth = 6 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 5 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 4 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
--------------------------------------------------------------------
Subtree, depth = 3 (0 data points).
Stopping condition 1 reached. All data points have the same target value.
In [56]:
nodeSizeMin5e4DTree = bldDecisionTree(glbObsFit, features, 'safe_loan', depthMax = 6,
                                    nodeSizeMin = 5e4, errReductionMin = -1)
--------------------------------------------------------------------
Subtree, depth = 0 (37224 data points).
Early stopping condition 2 reached. Reached minimum node size.

Now, let us evaluate the models (nodeSizeMin0e0DTree, nodeSizeMin2e3DTree, and nodeSizeMin5e4DTree) on the validation set (glbObsOOB).

In [58]:
print("Validation data, classification error (model 7): %.4f" %
      evlClassificationError(nodeSizeMin0e0DTree, glbObsOOB))
print("Validation data, classification error (model 8): %.4f" %
      evlClassificationError(nodeSizeMin2e3DTree, glbObsOOB))
print("Validation data, classification error (model 9): %.4f" %
      evlClassificationError(nodeSizeMin5e4DTree, glbObsOOB))
Validation data, classification error (model 7): 0.3838
Validation data, classification error (model 8): 0.3845
Validation data, classification error (model 9): 0.5034

Using the getNLeaves function, compute the number of leaves in each of the models (nodeSizeMin0e0DTree, nodeSizeMin2e3DTree, and nodeSizeMin5e4DTree).

In [59]:
print(getNLeaves(nodeSizeMin0e0DTree))
print(getNLeaves(nodeSizeMin2e3DTree))
print(getNLeaves(nodeSizeMin5e4DTree))
41
19
1

Quiz Question: Using the results obtained in this section, which model (nodeSizeMin0e0DTree, nodeSizeMin2e3DTree, or nodeSizeMin5e4DTree) would you choose to use?